[Feat] Monitoring configuration (Prometheus, Grafana, Loki) #197

Merged
hyomee2 merged 4 commits into develop from feat/#195-monitoring
Mar 21, 2026
hyomee2 (Collaborator) commented Mar 19, 2026

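Related issue 🛠

  • closed #

Work Description 📝

  • Added the monitoring configuration files.

1. docker-compose-monitoring.yml

  • Configures the Prometheus, Grafana, and Loki containers that run on the monitoring server.

2. prometheus.yml

  • Configures Prometheus to collect application metrics.
  • For now, /actuator/prometheus is reachable without authentication through the security configuration.

3. loki/config.yml

  • Adds a basic configuration for Loki.

Promtail is not configured yet. Once this PR is approved, I will confirm that the basic monitoring results show up on the dashboard, then set up Promtail to collect logs on the application server and forward them to Loki.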
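
The Promtail follow-up described above is not part of this PR. As a rough sketch of what such a configuration could look like, assuming a hypothetical monitoring host name (`monitoring-host.example`) and a hypothetical log directory (`/var/log/kareer`), neither of which appears in this PR:

```yaml
# Promtail sketch (not from this PR): tail application log files and push them to Loki.
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  # Promtail records how far it has read each file here.
  filename: /tmp/positions.yaml

clients:
  # Hypothetical address of the monitoring server running Loki on port 3100.
  - url: http://monitoring-host.example:3100/loki/api/v1/push

scrape_configs:
  - job_name: kareer_server_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: kareer_server
          # Hypothetical log path on the application server.
          __path__: /var/log/kareer/*.log
```

The `job` label is what a Grafana query against Loki would typically filter on, for example `{job="kareer_server"}`.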

ScreenShots 📷

To Reviewers 📢

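Summary by CodeRabbit

  • New features

    • Adds a monitoring stack built from Prometheus, Grafana, and Loki, enabling metric collection, log aggregation, and dashboard visualization
    • Exposes the application's health, info, and Prometheus endpoints to support integration with external monitoring
    • Adds the related Docker Compose and configuration files to support local and deployed monitoring environments
  • Configuration

    • Adds the settings and access permissions the application needs for Prometheus scraping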

@hyomee2 hyomee2 requested review from eraser502 and jeong1112 and removed request for eraser502 March 19, 2026 16:27
@hyomee2 hyomee2 requested a review from eraser502 March 19, 2026 16:27
coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 74c3b92a-7e8a-4511-89b4-114b01cf875f

📥 Commits

Reviewing files that changed from the base of the PR and between e9c345c and 9423dfe.

📒 Files selected for processing (1)
  • build.gradle
🚧 Files skipped from review as they are similar to previous changes (1)
  • build.gradle

📝 Walkthrough

Adds a monitoring stack based on Prometheus, Grafana, and Loki: the Micrometer Prometheus dependency is added to Gradle, a Docker Compose file and Prometheus/Loki configuration files are added, and the Spring Boot Actuator exposure and the security permit-all paths are updated.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build dependency: build.gradle | Adds the implementation 'io.micrometer:micrometer-registry-prometheus' dependency (Micrometer for Prometheus). |
| Docker monitoring infrastructure: deployment/docker-compose-monitoring.yml | Adds a Docker Compose file defining the Prometheus, Grafana, and Loki services (including images, ports, volumes, and restart policies). |
| Prometheus configuration: deployment/prometheus/prometheus.yml | Sets the global scrape/evaluation interval to 15s and defines the kareer_server job targeting https://api.ka-reer.com:443/actuator/prometheus. |
| Loki configuration: deployment/loki/config.yml | Adds auth_enabled: false, HTTP port 3100, filesystem/tsdb_shipper-based storage paths, and the schema configuration. |
| Application configuration: src/main/resources/application.yml | Adds management endpoint exposure settings (such as management.endpoints.web.exposure.include: health,info,prometheus) and management.endpoint.prometheus.enabled: true. |
| Security configuration: src/main/java/org/sopt/kareer/global/config/security/SecurityConfig.java | Adds "/actuator/prometheus" to PERMIT_ALL_PATTERNS so the Prometheus endpoint is allowed without authentication. |
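
The application.yml change summarized above can be pictured as the fragment below. It is a sketch built only from the two property paths the walkthrough names; the real file may contain more settings and different nesting order.

```yaml
# Sketch of the Actuator settings named in the walkthrough (other keys omitted).
management:
  endpoints:
    web:
      exposure:
        # Endpoints served over HTTP under /actuator
        include: health,info,prometheus
  endpoint:
    prometheus:
      # Enables the /actuator/prometheus scrape endpoint
      enabled: true
```

With micrometer-registry-prometheus on the classpath, Spring Boot serves the metrics in Prometheus text format at /actuator/prometheus, which is the path the kareer_server scrape job points at.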

Sequence Diagram(s)

sequenceDiagram
    rect rgba(100,150,240,0.5)
    participant Client
    end
    rect rgba(50,200,100,0.5)
    participant App as KareerServer
    end
    rect rgba(240,200,60,0.5)
    participant Prom as Prometheus
    end
    rect rgba(220,120,200,0.5)
    participant Graf as Grafana
    end
    rect rgba(200,100,100,0.5)
    participant Loki
    end

    Client->>App: Handle request (generates logs and metrics)
    App-->>Prom: Expose /actuator/prometheus (metrics)
    Prom->>Prom: Scrape and store metrics
    Graf->>Prom: Query (dashboard data)
    Client->>Graf: View dashboard
    App-->>Loki: Send logs (via collector/Fluent)
    Loki->>Loki: Index and store logs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop, hop, under the monitor's glow
Metrics dance and logs whisper low
Prometheus counts the numbers
Loki gathers up the stories
The rabbit cheers: monitoring has arrived! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly describes the main change. It accurately reflects that adding the Prometheus, Grafana, and Loki monitoring configuration is the primary change. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (4)
deployment/docker-compose-monitoring.yml (2)

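31-33: Consider adding a health check condition to depends_on

As written, depends_on only waits for the containers to start; it does not verify that the services are actually ready. If Grafana tries to connect before Prometheus or Loki has fully initialized, it may hit errors at startup.

♻️ Example of adding health check conditions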
   prometheus:
     image: prom/prometheus:latest
     ...
+    healthcheck:
+      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
+      interval: 10s
+      timeout: 5s
+      retries: 3

   grafana:
     ...
     depends_on:
-      - prometheus
-      - loki
+      prometheus:
+        condition: service_healthy
+      loki:
+        condition: service_healthy

   loki:
     image: grafana/loki:latest
     ...
+    healthcheck:
+      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3100/ready"]
+      interval: 10s
+      timeout: 5s
+      retries: 3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployment/docker-compose-monitoring.yml` around lines 31 - 33, The compose
uses a simple depends_on (the depends_on block with - prometheus and - loki)
which only waits for container start, not service readiness; add healthcheck
definitions to the prometheus and loki services (command, interval, timeout,
retries) and change Grafana's depends_on to use condition: service_healthy for
prometheus and loki so Grafana waits until those services pass their
healthchecks before starting; update the depends_on block and add corresponding
healthcheck blocks referenced by the service names prometheus and loki.

1-1: Consider removing the version field

Docker Compose V2 no longer needs the version field and ignores it. With current Docker Compose, removing the field is recommended.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployment/docker-compose-monitoring.yml` at line 1, Remove the top-level
"version" field from the compose file (the literal version: "3.8") because
modern Docker Compose v2 ignores it; locate the compose YAML containing the
version key and delete that single line so the file relies on the newer schema
(top-level keys like services, networks, volumes remain unchanged) and validate
the resulting docker-compose-monitoring.yml to ensure no syntax issues.
deployment/prometheus/prometheus.yml (1)

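5-10: Consider adding a scrape timeout and TLS settings

Adding a scrape_timeout setting helps keep scraping stable under network latency or transient failures. Prometheus defaults it to 10s, but setting it explicitly makes it easier to manage.

♻️ Suggested change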
 global:
   scrape_interval: 15s
   evaluation_interval: 15s
+  scrape_timeout: 10s

 scrape_configs:
   - job_name: 'kareer_server'
     metrics_path: '/actuator/prometheus'
+    scheme: https
     static_configs:
-      - targets: ['api.ka-reer.com:443']  # accessed over HTTPS
-    scheme: https
+      - targets: ['api.ka-reer.com:443']
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployment/prometheus/prometheus.yml` around lines 5 - 10, the scrape
configuration for job_name 'kareer_server' has no scrape_timeout or detailed TLS
settings, which leaves it vulnerable to network latency and authentication
problems; in the scrape_configs -> job_name: 'kareer_server' block, set
scrape_timeout explicitly (e.g., "15s" or a value suited to the service) and add
a tls_config under scheme: https that specifies server_name or ca_file, or set
insecure_skip_verify: false if needed, so that TLS verification is configured
explicitly (keep metrics_path: '/actuator/prometheus' and targets:
['api.ka-reer.com:443'] unchanged).
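
Put together, the timeout and TLS suggestions in this comment might look like the sketch below. The job name, metrics path, scheme, and target come from the PR; the `scrape_timeout` value and the `tls_config` entries are illustrative assumptions, not the merged configuration.

```yaml
# Sketch only: prometheus.yml with an explicit timeout and TLS verification settings.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'kareer_server'
    metrics_path: '/actuator/prometheus'
    scheme: https
    # Must not exceed scrape_interval; 10s is an assumed value.
    scrape_timeout: 10s
    tls_config:
      # Name checked against the server certificate (assumed to match the target host).
      server_name: api.ka-reer.com
      # Keep certificate verification on.
      insecure_skip_verify: false
    static_configs:
      - targets: ['api.ka-reer.com:443']
```

If the target uses a certificate from a public CA, the system trust store is normally enough and `ca_file` can be left out.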
build.gradle (1)

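89-90: Consider removing the unused tess4j dependency

tess4j is not used anywhere in the codebase. The project currently uses the Clova OCR service, and no Java file imports or uses tess4j. Because it bundles native libraries, it needlessly increases the build size, so it is better to add it back when a Tesseract-based OCR implementation is actually needed.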

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build.gradle` around lines 89 - 90, Remove the unused Tess4J dependency
declaration implementation 'net.sourceforge.tess4j:tess4j:5.13.0' from
build.gradle; search for any references to "tess4j" or imports of
net.sourceforge.tess4j in the codebase (classes/methods) and ensure none remain,
then run the build (e.g., ./gradlew assemble or dependency report) to confirm
the project still compiles and the dependency is no longer pulled in; if OCR is
needed later, re-add the dependency at that time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployment/docker-compose-monitoring.yml`:
- Line 9: Replace the use of floating :latest image tags with explicit
version-pinned tags to ensure reproducible deployments: locate the image string
"prom/prometheus:latest" and the other image entries referenced (the occurrences
matching the diff at the same image lines and the Loki image entries tied to
loki/config.yml) and change them to specific, tested version tags (e.g., a
concrete Prometheus and Loki release) and update any related compose references
so all three occurrences are pinned consistently; also add a short comment or
note near the image entries indicating the chosen version and source of truth
for future upgrades.
- Around line 22-34: The Grafana service currently exposes port 3000 with
default credentials; update the grafana service configuration to set a strong
admin password via environment variables (e.g., add GF_SECURITY_ADMIN_PASSWORD)
and disable anonymous access by setting GF_AUTH_ANONYMOUS_ENABLED=false; ensure
these env vars are added to the grafana service block and consider restricting
external exposure (remove or limit the "3000:3000" port mapping or bind it to
localhost) so the grafana service is no longer accessible with default
admin/admin credentials or anonymously.
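
The two docker-compose findings above (pinned image tags and a hardened Grafana service) could be combined roughly as follows. The tags are example releases chosen for illustration, not the versions this PR settled on, and `GRAFANA_ADMIN_PASSWORD` is a hypothetical variable name supplied from the environment or an `.env` file.

```yaml
# Sketch only: pinned images and a locked-down Grafana service.
services:
  prometheus:
    image: prom/prometheus:v2.53.0      # example pinned tag

  grafana:
    image: grafana/grafana:11.1.0       # example pinned tag
    ports:
      # Bind to localhost so the UI is not exposed on every interface.
      - "127.0.0.1:3000:3000"
    environment:
      # Compose aborts with this message if the variable is unset or empty.
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:?set GRAFANA_ADMIN_PASSWORD}
      GF_AUTH_ANONYMOUS_ENABLED: "false"

  loki:
    image: grafana/loki:3.0.0           # example pinned tag
```

Binding Grafana to 127.0.0.1 assumes access goes through a reverse proxy or an SSH tunnel; if the dashboard must be reachable directly, a firewall rule is the alternative.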

In `@deployment/loki/config.yml`:
- Line 1: Summary: auth_enabled is set to false leaving Loki open to
unauthenticated Promtail pushes and docker-compose-monitoring.yml binds port
3100 to the host. Fix: set auth_enabled: true in deployment/loki/config.yml (or
document/automate secure credentials) and update docker-compose-monitoring.yml
to avoid public host binding (use internal network only or bind to
127.0.0.1:3100:3100) and ensure firewall rules block external access; reference
the auth_enabled key, Promtail clients, and the ports mapping "3100:3100" when
making the changes.
- Around line 6-20: The Loki config uses storage_config with boltdb and
schema_config setting store: boltdb and schema: v11 which is incompatible with
Loki 3.x; update the config to use the tsdb store and a supported schema (e.g.,
change store: boltdb -> store: tsdb and schema: v12 or newer) and adjust
storage_config to match tsdb expectations (replace or remove boltdb-specific
keys and ensure filesystem/chunks layout remains valid), or alternatively pin
the Docker image in docker-compose-monitoring.yml to a Loki 2.x tag to keep the
existing boltdb/schema: v11 settings.
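
For the first option in the schema comment above (moving to the tsdb store), a Loki 3.x style fragment might look like this sketch. The start date and directory paths are assumptions; the walkthrough only says the final file uses filesystem/tsdb_shipper storage.

```yaml
# Sketch only: tsdb index with filesystem chunk storage for a single-node Loki 3.x.
schema_config:
  configs:
    - from: 2024-04-01          # assumed start date for this schema period
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index   # assumed path
    cache_location: /loki/tsdb-cache           # assumed path
  filesystem:
    directory: /loki/chunks                    # assumed path
```

The directories need to live on a mounted volume in the Compose file, otherwise the index and chunks are lost when the container is recreated.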

In `@src/main/java/org/sopt/kareer/global/config/security/SecurityConfig.java`:
- Line 43: The /actuator/prometheus entry currently listed publicly in
SecurityConfig should not be exposed without protection; update SecurityConfig
(e.g., the configure(HttpSecurity http) or WebSecurityCustomizer setup in
SecurityConfig.java) to restrict access to the "/actuator/prometheus"
endpoint—either require authentication/authority (add a requestMatcher for
"/actuator/prometheus" and .authenticated() or a specific role) and enable HTTP
Basic for the actuator endpoints, or implement an IP whitelist by adding a
custom WebExpression (hasIpAddress(...)) or a filter that allows only Prometheus
scrape IPs; ensure the change is applied where the antMatchers/requestMatchers
are configured so the prometheus endpoint is no longer permitAll.
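
If the endpoint is protected with HTTP Basic as this comment suggests, the Prometheus scrape job has to send credentials too. A sketch of that side, where the username and the password file path are hypothetical:

```yaml
# Sketch only: scrape job that authenticates against a Basic-protected /actuator/prometheus.
scrape_configs:
  - job_name: 'kareer_server'
    metrics_path: '/actuator/prometheus'
    scheme: https
    basic_auth:
      username: prometheus-scraper                                    # hypothetical account
      password_file: /etc/prometheus/secrets/kareer_scrape_password   # hypothetical path
    static_configs:
      - targets: ['api.ka-reer.com:443']
```

Using `password_file` instead of an inline `password` keeps the secret out of the repository; the file would be mounted into the Prometheus container.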

In `@src/main/resources/application.yml`:
- Around line 103-104: application.yml currently exposes the info endpoint but
SecurityConfig.java's PERMIT_ALL_PATTERNS does not include /actuator/info,
causing unauthenticated requests to get 401/403; either remove "info" from
management.endpoints.web.exposure.include in application.yml if you want it
protected, or add "/actuator/info" (and optionally "/actuator/info/**") to the
PERMIT_ALL_PATTERNS constant in SecurityConfig (and update any related security
matcher logic) so /actuator/info is publicly accessible; choose one of these two
actions and make the corresponding change consistently.
- Around line 108-109: The health endpoint is currently exposing sensitive infra
details via management.endpoint.health.show-details: always while
/actuator/health is in PERMIT_ALL_PATTERNS; change
management.endpoint.health.show-details to either when-authorized (preferred) or
never in application.yml and ensure your security config no longer leaves
/actuator/health publicly permitted (remove it from PERMIT_ALL_PATTERNS or add
an authorization rule) so that detailed health info is only returned to
authenticated/authorized principals; update any related config or docs
referencing /actuator/health accordingly.
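
The preferred option in the health endpoint comment above comes down to a single property. A minimal sketch, showing only that key:

```yaml
# Sketch only: return health details to authorized users, plain status to everyone else.
management:
  endpoint:
    health:
      show-details: when-authorized
```

With this value an anonymous request to /actuator/health still receives the overall status, which is usually enough for load balancer checks.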

---

Nitpick comments:
In `@build.gradle`:
- Around line 89-90: Remove the unused Tess4J dependency declaration
implementation 'net.sourceforge.tess4j:tess4j:5.13.0' from build.gradle; search
for any references to "tess4j" or imports of net.sourceforge.tess4j in the
codebase (classes/methods) and ensure none remain, then run the build (e.g.,
./gradlew assemble or dependency report) to confirm the project still compiles
and the dependency is no longer pulled in; if OCR is needed later, re-add the
dependency at that time.

In `@deployment/docker-compose-monitoring.yml`:
- Around line 31-33: The compose uses a simple depends_on (the depends_on block
with - prometheus and - loki) which only waits for container start, not service
readiness; add healthcheck definitions to the prometheus and loki services
(command, interval, timeout, retries) and change Grafana's depends_on to use
condition: service_healthy for prometheus and loki so Grafana waits until those
services pass their healthchecks before starting; update the depends_on block
and add corresponding healthcheck blocks referenced by the service names
prometheus and loki.
- Line 1: Remove the top-level "version" field from the compose file (the
literal version: "3.8") because modern Docker Compose v2 ignores it; locate the
compose YAML containing the version key and delete that single line so the file
relies on the newer schema (top-level keys like services, networks, volumes
remain unchanged) and validate the resulting docker-compose-monitoring.yml to
ensure no syntax issues.

In `@deployment/prometheus/prometheus.yml`:
- Around line 5-10: 현재 job_name 'kareer_server'의 scrape 설정에 scrape_timeout과 TLS
세부 설정이 없어 네트워크 지연/인증 문제에 취약합니다; 업데이트할 때 scrape_configs -> job_name:
'kareer_server' 블록에 scrape_timeout (예: "15s" 또는 서비스에 맞는 값)을 명시하고 scheme: https
아래에 tls_config을 추가하여 server_name 또는 ca_file을 지정하거나 필요시 insecure_skip_verify:
false로 설정해 TLS 검증을 명시적으로 구성하세요 (metrics_path: '/actuator/prometheus'와 targets:
['api.ka-reer.com:443']는 그대로 유지).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6aa87b94-7751-4c4d-b85a-057150a12936

📥 Commits

Reviewing files that changed from the base of the PR and between c9acb2b and 629dfd4.

📒 Files selected for processing (6)
  • build.gradle
  • deployment/docker-compose-monitoring.yml
  • deployment/loki/config.yml
  • deployment/prometheus/prometheus.yml
  • src/main/java/org/sopt/kareer/global/config/security/SecurityConfig.java
  • src/main/resources/application.yml

Comment thread deployment/docker-compose-monitoring.yml Outdated
Comment thread deployment/docker-compose-monitoring.yml
Comment thread deployment/loki/config.yml
Comment thread deployment/loki/config.yml
Comment thread src/main/resources/application.yml
Comment thread src/main/resources/application.yml
jeong1112 (Collaborator) left a comment

👍

Comment thread build.gradle Outdated
@hyomee2 hyomee2 merged commit c427a93 into develop Mar 21, 2026
2 checks passed
@hyomee2 hyomee2 deleted the feat/#195-monitoring branch March 21, 2026 07:07
@hyomee2 hyomee2 changed the title from "[Feat] #195 Monitoring configuration (Prometheus, Grafana, Loki)" to "[Feat] Monitoring configuration (Prometheus, Grafana, Loki)" Mar 23, 2026